Soft Discretization to Enhance the Continuous Decision Tree Induction*
Authors
Abstract
Decision tree induction has been widely used to generate classifiers from training data by recursively splitting the data space. When training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. The common approach is to partition the attribute range into two or more intervals using one or more cut points. An inherent disadvantage of these methods is that the use of sharp (crisp) cut points makes the induced decision trees sensitive to noise. To overcome this problem, this paper presents an alternative method, called soft discretization, based on fuzzy set theory. Whereas a classical decision tree assigns an unknown object to exactly one class, a decision tree based on soft discretization associates a set of possibility values with several or all classes. As a result, even if the object carries uncertainty, the decision tree does not return a single, possibly completely wrong, answer but a set of possibility values. The approach has been successfully applied to an industrial problem of monitoring a typical machining process. Experimental results showed that soft discretization yields better classification accuracy than a classical decision tree in both training and testing, which suggests that the robustness of decision trees can be improved by means of soft discretization.
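The core idea above can be sketched in a few lines: instead of a crisp threshold, a fuzzy transition region around the cut point gives a test example graded membership in both branches, and the class distributions of the two branches are blended by that membership. The following is a minimal illustrative sketch, not the paper's actual algorithm; the linear membership function, the half-width `delta`, and the class names (`normal`/`worn`, loosely evoking the machining-monitoring application) are all assumptions made here for demonstration.

```python
def membership_low(x, t, delta):
    """Degree to which value x belongs to the 'low' branch of a soft split
    at cut point t, with a linear fuzzy region [t - delta, t + delta]."""
    if x <= t - delta:
        return 1.0
    if x >= t + delta:
        return 0.0
    # linear transition inside the fuzzy region
    return (t + delta - x) / (2 * delta)

def class_possibilities(x, t, delta, low_dist, high_dist):
    """Blend the class distributions of the two branches according to
    the membership degree, yielding possibility values per class."""
    mu = membership_low(x, t, delta)
    classes = set(low_dist) | set(high_dist)
    return {c: mu * low_dist.get(c, 0.0) + (1 - mu) * high_dist.get(c, 0.0)
            for c in classes}

# A point near the cut receives graded support for both classes
# instead of a single crisp label:
poss = class_possibilities(5.1, t=5.0, delta=0.5,
                           low_dist={"normal": 0.9, "worn": 0.1},
                           high_dist={"normal": 0.2, "worn": 0.8})
```

With a crisp split (delta → 0), the same point would fall entirely into the high branch and be labeled `worn`; the soft split instead reports that both classes remain possible, which is what makes the tree less sensitive to noise near the cut point.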
Similar resources
On Exploring Soft Discretization of Continuous Attributes
Searching for a binary partition of attribute domains is an important task in data mining. It is present in both decision tree construction and discretization. The most important advantages of decision tree methods are compactness and clearness of knowledge representation as well as high accuracy of classification. Decision tree algorithms also have some drawbacks. In cases of large data tables...
A Decision Boundary based Discretization Technique using Resampling
Many supervised induction algorithms require discrete data, even though real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of the induced models. Usually, discretization and other types of statistical processes are applied to subsets of the population as th...
A Soft Decision Tree
Searching for a binary partition of attribute domains is an important task in Data Mining, particularly in decision tree methods. The most important advantages of decision tree methods are the compactness and clearness of the presented knowledge and the high accuracy of classification. In the case of large data tables, the existing decision tree induction methods often prove inefficient in both comp...
Some Enhancements of Decision Tree Bagging
This paper investigates enhancements of decision tree bagging which mainly aim at improving computation times, but also accuracy. The three questions which are reconsidered are: discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without significant decrease in accuracy. Then a new method...
Investigation and Reduction of Discretization Variance in Decision Tree Induction
This paper focuses on the variance introduced by the discretization techniques used to handle continuous attributes in decision tree induction. Different discretization procedures are first studied empirically, then means to reduce the discretization variance are proposed. The experiment shows that discretization variance is large and that it is possible to reduce it significantly without notable...
Publication date: 2001